AITopics

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (0.46)
Overview (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing SystemsApr-24-2026, 09:34:29 GMT

Counterfactual Explanations Can Be Manipulated

Counterfactual explanations are emerging as an attractive option for providing recourse to individuals adversely impacted by algorithmic decisions. As they are deployed in critical applications (e.g. law enforcement, financial lending), it becomes important to ensure that we clearly understand the vulnerabilties of these methods and find ways to address them. However, there is little understanding of the vulnerabilities and shortcomings of counterfactual explanations. In this work, we introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated. More specifically, we show counterfactual explanations may converge to drastically different counterfactuals under a small perturbation indicating they are not robust. Leveraging this insight, we introduce a novel objective to train seemingly fair models where counterfactual explanations find much lower cost recourse under a slight perturbation. We describe how these models can unfairly provide low-cost recourse for specific subgroups in the data while appearing fair to auditors. We perform experiments on loan and violent crime prediction data sets where certain subgroups achieve up to 20x lower cost recourse under the perturbation. These results raise concerns regarding the dependability of current counterfactual explanation techniques, which we hope will inspire investigations in robust counterfactual explanations.1

artificial intelligence, counterfactual explanation, natural language, (17 more...)

Country: North America > United States (1.00)

Genre: Research Report (0.46)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.89)

Technology: Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)

Neural Information Processing SystemsFeb-8-2026, 04:14:28 GMT

14cdc9013d80338bf81483a7736ea05c-Paper-Conference.pdf

explanation, graph neural network, substructure, (13 more...)

Country:

Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Macao (0.04)

Genre:

Research Report > New Finding (0.46)
Overview (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing SystemsDec-23-2025, 21:37:43 GMT

AttCAT: Explaining Transformers via Attentive Class Activation Tokens

Transformers have improved the state-of-the-art in various natural language processing and computer vision tasks. However, the success of the Transformer model has not yet been duly explained. Current explanation techniques, which dissect either the self-attention mechanism or gradient-based attribution, do not necessarily provide a faithful explanation of the inner workings of Transformers due to the following reasons: first, attention weights alone without considering the magnitudes of feature values are not adequate to reveal the self-attention mechanism; second, whereas most Transformer explanation techniques utilize self-attention module, the skip-connection module, contributing a significant portion of information flows in Transformers, has not yet been sufficiently exploited in explanation; third, the gradient-based attribution of individual feature does not incorporate interaction among features in explaining the model's output. In order to tackle the above problems, we propose a novel Transformer explanation technique via attentive class activation tokens, aka, AttCAT, leveraging encoded features, their gradients, and their attention weights to generate a faithful and confident explanation for Transformer's output. Extensive experiments are conducted to demonstrate the superior performance of AttCAT, which generalizes well to different Transformer architectures, evaluation metrics, datasets, and tasks, to the baseline methods.

artificial intelligence, natural language, proceedings, (8 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Jeon, Minseok, Park, Seunghyun

GDLNN: Marriage of Programming Language and Neural Networks for Accurate and Easy-to-Explain Graph Classification

arXiv.org Artificial IntelligenceOct-2-2025

GDLNN combines a domain-specific programming language, called GDL, with neural networks. The main strength of GDLNN lies in its GDL layer, which generates expressive and interpretable graph representations. Since the graph representation is interpretable, existing model explanation techniques can be directly applied to explain GDLNN's predictions. Our evaluation shows that the GDL-based representation achieves high accuracy on most graph classification benchmark datasets, outperforming dominant graph learning methods such as GNNs. Applying an existing model explanation technique also yields high-quality explanations of GDLNN's predictions. Furthermore, the cost of GDLNN is low when the explanation cost is included. In graph classification, graph representation learning holds the key to success. The goal of this task is to learn useful feature representations (embeddings) for the entire graph that effectively capture its key structure and properties. These representations are utilized in various graph machine learning tasks, including graph classification. Beyond predictive performance, learning interpretable graph representations has become increasingly important, as it directly impacts model explainability, crucial in decision-critical domains such as drug discovery (Kakkad et al., 2023).

artificial intelligence, machine learning, representation, (18 more...)

2510.00374

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Chauhan, Sam, Duguet, Estelle, Ramakrishnan, Karthik, Van Deventer, Hugh, Kruger, Jack, Subbaraman, Ranjan

SHLIME: Foiling adversarial attacks fooling SHAP and LIME

arXiv.org Artificial IntelligenceAug-18-2025

Post hoc explanation methods, such as LIME and SHAP, provide interpretable insights into black-box classifiers and are increasingly used to assess model biases and generalizability. However, these methods are vulnerable to adversarial manipulation, potentially concealing harmful biases. Building on the work of Slack et al. (2020), we investigate the susceptibility of LIME and SHAP to biased models and evaluate strategies for improving robustness. We first replicate the original COMPAS experiment to validate prior findings and establish a baseline. We then introduce a modular testing framework enabling systematic evaluation of augmented and ensemble explanation approaches across classifiers of varying performance. Using this framework, we assess multiple LIME/SHAP ensemble configurations on out-of-distribution models, comparing their resistance to bias concealment against the original methods. Our results identify configurations that substantially improve bias detection, highlighting their potential for enhancing transparency in the deployment of high-stakes machine learning systems.

artificial intelligence, classifier, machine learning, (15 more...)

2508.11053

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.42)
Government > Military (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceJun-10-2025

ExplainBench: A Benchmark Framework for Local Model Explanations in Fairness-Critical Applications

Afful, James

As machine learning systems are increasingly deployed in high-stakes domains such as criminal justice, finance, and healthcare, the demand for interpretable and trustworthy models has intensified. Despite the proliferation of local explanation techniques, including SHAP, LIME, and counterfactual methods, there exists no standardized, reproducible framework for their comparative evaluation, particularly in fairness-sensitive settings. We introduce ExplainBench, an open-source benchmarking suite for systematic evaluation of local model explanations across ethically consequential datasets. ExplainBench provides unified wrappers for popular explanation algorithms, integrates end-to-end pipelines for model training and explanation generation, and supports evaluation via fidelity, sparsity, and robustness metrics. The framework includes a Streamlit-based graphical interface for interactive exploration and is packaged as a Python module for seamless integration into research workflows. We demonstrate ExplainBench on datasets commonly used in fairness research, such as COMPAS, UCI Adult Income, and LendingClub, and showcase how different explanation methods behave under a shared experimental protocol. By enabling reproducible, comparative analysis of local explanations, ExplainBench advances the methodological foundations of interpretable machine learning and facilitates accountability in real-world AI systems.

explanation, machine learning, natural language, (18 more...)

2506.0633

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Law (0.49)
Health & Medicine (0.49)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Maeß, Johannes, Montavon, Grégoire, Nakajima, Shinichi, Müller, Klaus-Robert, Schnake, Thomas

Uncovering the Structure of Explanation Quality with Spectral Analysis

arXiv.org Artificial IntelligenceApr-14-2025

As machine learning models are increasingly considered for high-stakes domains, effective explanation methods are crucial to ensure that their prediction strategies are transparent to the user. Over the years, numerous metrics have been proposed to assess quality of explanations. However, their practical applicability remains unclear, in particular due to a limited understanding of which specific aspects each metric rewards. In this paper we propose a new framework based on spectral analysis of explanation outcomes to systematically capture the multifaceted properties of different explanation techniques. Our analysis uncovers two distinct factors of explanation quality-stability and target sensitivity-that can be directly observed through spectral decomposition. Experiments on both MNIST and ImageNet show that popular evaluation techniques (e.g., pixel-flipping, entropy) partially capture the trade-offs between these factors. Overall, our framework provides a foundational basis for understanding explanation quality, guiding the development of more reliable techniques for evaluating explanations.

explanation, machine learning, natural language, (20 more...)